Exponentiated Gradient versus Gradient Descent for Linear Predictors Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556
Authors
Abstract
We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG. They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but EG has a much smaller loss if only a few components of the input are relevant for the predictions. We have performed experiments, which show that our worst-case upper bounds are quite tight already on simple artificial data.
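A minimal sketch of the two updates described in the abstract, under squared loss. The learning rate eta, the toy data, and the choice of keeping the EG weights as a normalized probability vector are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def gd_update(w, x, y, eta=0.1):
    """GD update: subtract the gradient of the squared error on this trial."""
    y_hat = np.dot(w, x)
    return w - eta * (y_hat - y) * x

def eg_update(w, x, y, eta=0.1):
    """EG update: multiply each weight by an exponential factor whose
    exponent is the corresponding gradient component, then renormalize."""
    y_hat = np.dot(w, x)
    w_new = w * np.exp(-eta * (y_hat - y) * x)
    return w_new / w_new.sum()

# Toy on-line trials: 4 inputs, only the first component is relevant.
rng = np.random.default_rng(0)
w_gd = np.zeros(4)
w_eg = np.full(4, 0.25)          # EG starts from the uniform weight vector
for _ in range(200):
    x = rng.uniform(-1, 1, size=4)
    y = x[0]                      # target depends on one relevant input
    w_gd = gd_update(w_gd, x, y)
    w_eg = eg_update(w_eg, x, y)
print("GD weights:", np.round(w_gd, 3))
print("EG weights:", np.round(w_eg, 3))
```

On data of this kind, where few input components matter, the multiplicative EG update concentrates weight on the relevant component quickly, which is the regime in which the paper's bounds favor EG over GD.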
Similar Resources
Perspectives of Current Research about the Complexity of Learning on Neural Nets Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556
Computing the Maximum Bichromatic Discrepancy, with Applications to Computer Graphics and Machine Learning Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556
Computing the maximum bichromatic discrepancy is an interesting theoretical problem with important applications in computational learning theory, computational geometry and computer graphics. In this paper we give algorithms to compute the maximum bichromatic discrepancy for simple geometric ranges, including rectangles and halfspaces. In addition, we give extensions to other discrepancy problems.
Decision Trees Have Approximate Fingerprints Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556
We prove that decision trees exhibit the "approximate fingerprint" property, and therefore are not polynomially learnable using only equivalence queries. A slight modification of the proof extends this result to several other representation classes of boolean concepts which have been studied in computational learning theory.
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and Its Variants Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556
A version of this is to appear as a chapter in The Computational and Learning Complexity of Neural Networks (ed. Ian Parberry), MIT Press. There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models, much-stud...
Neural Networks with Quadratic VC Dimension Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556. Submitted to Workshop on Neural Information Processing, NIPS'95
This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generaliza...
Journal title:
Volume, Issue:
Pages: -
Publication date: 1996